Marginal Maximum Likelihood Estimation for a Psychometric
Model of Discontinuous Development 

Abstract 

Standard item response theory (IRT) models posit latent variables to account for
regularities in students' performances on test items. They can accommodate learning 
only if the expected changes in performance are smooth and, in an appropriate metric, 
uniform over items. Wilson's "Saltus" model extends the ideas of IRT to development 
that occurs in stages, where expected changes can be discontinuous, show different
patterns for different types of items, and even exhibit reversals in probabilities of success 
on certain tasks. Examples include Piagetian stages of psychological development and 
Siegler's rule-based learning. This paper derives marginal maximum likelihood (MML) 
estimation equations for the structural parameters of the Saltus model and suggests a 
computing approximation based on the EM algorithm. For individual examinees,
empirical Bayes probabilities of learning-stage are given, along with proficiency
parameter estimates conditional on stage membership. The MML solution is illustrated 
with simulated data and an example from the domain of mixed number subtraction. 

Key words: Cognitive diagnosis, empirical Bayes, item response theory, marginal 
maximum likelihood, mixture models, Saltus model 
1.0 Introduction

The models of classical test theory and item response theory (IRT) characterize 
examinees simply in terms of their propensities to make correct answers in a domain of 
items — that is, their overall proficiencies. Correspondingly, the processes and the 
outcomes of learning can be expressed through these models only as changes in overall 
proficiency. This characterization falls short for problems of description and decision- 
making cast in the framework of what we are learning about how people solve problems, 
acquire knowledge, and increase their proficiencies (Glaser, 1981; Masters & Mislevy, 
1993; Snow & Lohman, 1989). Learners become more competent not simply by 
accreting additional facts and skills, but by reconfiguring their previous knowledge, by 
"chunking" information to reduce memory loads, and by developing strategies and 
models that help them discern when and how facts and skills are relevant. When
evaluating or planning instruction, the important questions may not be "How many items
did this student answer correctly?" or "What proportion of the population would have
scores lower than hers?", but, in Thompson's (1982) words, "What can this person be
thinking so that his actions make sense from his perspective?" and "What organization
does the student have in mind so that his actions seem, to him, to form a coherent
pattern?" Taking this point of view, Glaser, Lesgold, and Lajoie (1987) advocate 
"achievement testing as ... a method of indexing stages of competence through indicators 
of the level of development of knowledge, skill, and cognitive process." 

Models that incorporate this perspective have begun to appear in the testing 
literature. Examples include Tatsuoka's (1983, 1990) extension of IRT to "rule space" 
through the use of cognitive task analyses, Embretson's (1985) and Samejima's (1983) 
models for alternative response strategies when subtask results can be observed, and 
Falmagne's (1989), Haertel's (1984), and Paulson's (1986) latent-class models built 
around the combinations of skills that tasks demand. 
Wilson's (1984, 1989) "Saltus" model for learning that occurs in conceptual or
developmental stages is another model of this type. Each subject is characterized by two 
variables, one qualitative and the other quantitative. The qualitative parameter, denoting 
stage membership, indicates the nature of proficiency, while the quantitative parameter 
indicates degree of proficiency. Although both types of parameters are unobservable, 
approximate solutions in early demonstrations of Saltus treated estimates of stage 
membership (based on total scores) as if they were known, true, parameter values, 
followed by "tailored simulations" to correct for some of the effects of this 
oversimplification. The solution offered in the present paper more properly accounts for 
the uncertainty associated with examinees' stage memberships, using Mislevy and
Verhelst's (1990) empirical Bayesian approach for mixtures of test theory models. After 
reviewing the form of the Saltus model, we present marginal maximum likelihood 
(MML) estimation procedures and illustrate their use with simulated data and Tatsuoka's 
mixed number subtraction data (Klein, Birenbaum, Standiford, and Tatsuoka, 1981). 

2.0 The Saltus Model 

Wilson's (1984, 1989) Saltus model for hierarchical development generalizes the 
Rasch model for dichotomous test items (Rasch, 1960/1980) by positing H
"developmental stages." An examinee is assumed to be in exactly one stage at the time 
of testing, but stage membership is not directly observed. Items are also classified into H 
classes. It is assumed that a Rasch model holds within each developmental stage, and the 
relative distances between items within a given item class are the same irrespective of 
developmental stage. The relative difficulties among item classes may differ from one 
developmental stage to another, however. The amounts by which item class difficulties 
vary for different stages are the "Saltus parameters." Saltus parameters can capture how 
certain types of items become much easier relative to others as students reconceptualize a 
domain or add a new rule to their repertoire, or how certain items can actually become
harder as students progress from an earlier stage to a more advanced one if they were
previously answered correctly for the wrong reason. Wilson's (1989) illustrative
examples concerned the development of children's proportional reasoning abilities, using
balance-beam data collected by Siegler (1981), and the acquisition of subtraction rules in
a Gagnéan learning hierarchy (see Gagné, 1968).

Anticipating MML estimation, we describe an estimation model in two phases.
First is the Saltus item response model, which gives probabilities of correct response 
conditional on stage membership and proficiency. Second is a population model, which 
concerns the proportions of a population of examinees at each stage and the distributions 
of proficiency within stages. 

2.1 The Saltus Item Response Model

Saltus is an extension of the Rasch model (RM) for dichotomous test items.
Under the RM, the probability that an examinee with proficiency $\theta$ will respond
correctly to Item j ($x_j = 1$ rather than $x_j = 0$) is given as

$$P(x_j = 1 \mid \theta, \beta_j) = \Psi(\theta - \beta_j), \qquad (1)$$

where $\beta_j$ is the difficulty parameter of Item j, and $\Psi$ is the cumulative logistic
distribution function; that is,

$$\Psi(z) = \exp(z) / [1 + \exp(z)]. \qquad (2)$$

Under Saltus, an examinee is characterized by not just a proficiency parameter $\theta$,
but also a stage membership parameter $\phi$. If there are H potential developmental stages,
$\phi_i = (\phi_{i1}, \ldots, \phi_{iH})$, where $\phi_{ih}$ takes the value of 1 if Examinee i is in
Stage h and 0 if not. As with $\theta$, values of $\phi$ are not observable.

Under Saltus, as under the RM, item j has a difficulty parameter $\beta_j$. Item j is also
associated with developmental stages through the item-class indicator $b_j$. In analogy to $\phi$,
$b_j = (b_{j1}, \ldots, b_{jH})$, where $b_{jk}$ takes the value of 1 if item j belongs to item Class k, and 0
otherwise. In contrast with $\phi$, however, $b_j$ is known a priori for all items.

$T = (\tau_{hk})$ is an H-by-H matrix of Saltus parameters. In particular, $\tau_{hk}$ expresses
an effect on the difficulty of items in Class k that applies to examinees in Stage h. The
probability that an examinee with stage membership parameter $\phi$ and proficiency $\theta$ will
respond correctly to item j is given as

$$P(x_j = 1 \mid \theta, \phi, \beta_j, T) = \prod_h \prod_k \Psi(\theta - \beta_j + \tau_{hk})^{\phi_h b_{jk}}. \qquad (3)$$

In the sequel, $\Psi(\theta - \beta_j + \tau_{hk})$ will be abbreviated as $\Psi_{jhk}(\theta)$. Note that the double product
over h and k in (3) is merely a device to pick up the appropriate Saltus parameter for item
j that corresponds to the developmental stage of this particular examinee, since the
exponent $\phi_h b_{jk}$ is 1 for that combination of stage and item class and 0 otherwise.

Item responses are assumed to be independent given $\theta$ and $\phi$. Letting $x =
(x_1, \ldots, x_n)$ be a vector of responses to n items,

$$p(x \mid \theta, \phi, \beta, T) = \prod_j \prod_h \prod_k \{ \Psi_{jhk}(\theta)^{x_j} [1 - \Psi_{jhk}(\theta)]^{1 - x_j} \}^{\phi_h b_{jk}}. \qquad (4)$$

For brevity, we define

$$P_h(x \mid \theta, \beta, T) = \prod_j \prod_k \{ \Psi_{jhk}(\theta)^{x_j} [1 - \Psi_{jhk}(\theta)]^{1 - x_j} \}^{b_{jk}};$$

$P_h(x \mid \theta, \beta, T)$, or $P_h(x \mid \theta)$ for short, is the conditional probability of a response pattern x
given $\theta$ and membership in Stage h.
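
The response model in (3) and the stage-conditional pattern probability can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 0-based stage and class indices, and the two-stage, two-class parameter values are all hypothetical and do not come from the paper.

```python
import math

def logistic(z):
    """Cumulative logistic function, Psi(z) = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

def p_correct(theta, beta_j, tau, stage, item_class):
    """Equation (3) for an examinee in `stage` and an item in `item_class`.

    Stages and classes are 0-based here, so tau[0][k] and tau[h][0] are the
    entries fixed at zero by the identifying restrictions.
    """
    return logistic(theta - beta_j + tau[stage][item_class])

def pattern_prob(x, theta, beta, tau, stage, item_classes):
    """P_h(x | theta): probability of response pattern x given theta and stage."""
    prob = 1.0
    for x_j, beta_j, k in zip(x, beta, item_classes):
        p = p_correct(theta, beta_j, tau, stage, k)
        prob *= p if x_j == 1 else 1.0 - p
    return prob

# Hypothetical setup: four items, two item classes, two stages.
beta = [-1.0, 1.0, -1.0, 1.0]
item_classes = [0, 0, 1, 1]
tau = [[0.0, 0.0], [0.0, 1.5]]   # only the Stage 2 / Class 2 entry is free

print(pattern_prob([1, 0, 1, 1], 0.3, beta, tau, 1, item_classes))
```

With these illustrative values, an examinee in the second stage finds the second-class items easier than an examinee with the same $\theta$ in the first stage, while first-class items behave identically in both stages.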



2.1.1 Restrictions to Resolve Scaling Indeterminacies

The model defined in (3) is not identified unless further restrictions are imposed 
on item and Saltus parameters. This can be accomplished in several ways, but once 
parameters have been estimated under one set of restrictions, it is straightforward to
translate them to what they would be under a different set. The following restrictions
prove convenient for MML estimation:

$$\sum_j \beta_j = 0,$$

so that item parameters are centered around the origin;

$$\tau_{1k} = 0 \text{ for all } k,$$

so that the item parameter estimates apply directly to Stage 1 in a simple RM, but relative
changes in item difficulties may apply for other stages via Saltus parameters; and

$$\tau_{h1} = 0 \text{ for all } h,$$

so that the item difficulty scale within each Stage h is set by restricting its Class 1 item
difficulty parameters to be the same as those in Stage 1. Together, this system constitutes
a necessary set of restrictions for identifying the model. An empirical check on the
identification status of a Saltus model with a particular configuration of b's and a 
particular set of data is discussed in Section 3.3. 

2.1.2 A Special Case 

Wilson (1989) has discussed the case in which arrival in Stage h is signaled by a 
drop in the difficulty of items in item Class h, relative to items in all other classes. This 
difficulty shift is maintained in higher stages. This structure corresponds to a set of 
constraints among Saltus parameters: 

$$\tau_{hk} = 0 \text{ if } h < k,$$

and

$$\tau_{hk} = \tau_{h'k} \text{ if both } h \ge k \text{ and } h' \ge k.$$

In this case there are only H-1 unique values for Saltus parameters, which for
convenience may be called simply $\tau_2, \ldots, \tau_H$.
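
As an illustration of this special case, the following sketch builds the full H-by-H matrix T from the H-1 free values. It assumes the second constraint is read as holding for stages at or above the class index, so that the shift for Class k first appears in Stage k and persists in higher stages; the function name and the numerical values are hypothetical.

```python
def saltus_matrix(tau_values):
    """Build the H-by-H matrix T from the H-1 free values tau_2, ..., tau_H.

    Indices are 0-based: entry [h][k] is zero when h < k or when k is the
    first class, and equals the Class k value otherwise.
    """
    H = len(tau_values) + 1
    T = [[0.0] * H for _ in range(H)]
    for h in range(H):
        for k in range(1, H):
            if h >= k:
                T[h][k] = tau_values[k - 1]
    return T

# Hypothetical values for tau_2 and tau_3 in a three-stage model.
T = saltus_matrix([1.5, 0.8])
for row in T:
    print(row)
```

The first row and first column stay at zero, which matches the identifying restrictions of Section 2.1.1.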
2.2 The Population Model

For estimation purposes, we assume a population in which the proportion of
examinees in each developmental Stage h is $\pi_h$, with $0 < \pi_h < 1$. Denote by $\pi$ the vector
$(\pi_1, \ldots, \pi_H)$.

The density function of $\theta$ for Stage h is denoted $g_h(\theta)$. We shall discuss two
special cases for g: a normal solution, wherein $\theta$ is distributed as $N(\mu_h, \sigma_h)$, and a
(nearly) nonparametric approximation, wherein each $g_h$ is characterized as a histogram
over a grid of prespecified points. The weight or density at point q for Stage h is denoted
$\omega_{hq}$. For generality, we use $\alpha$ to denote population density parameters. In the normal
solution, $\alpha = (\mu_1, \sigma_1, \ldots, \mu_H, \sigma_H)$; in the nonparametric approximation, $\alpha = (\omega_{hq})$.
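
Both special cases can be represented as weights on a common grid. The sketch below, which is an illustration rather than the authors' implementation, evaluates a normal density at prespecified points and normalizes the values to sum to one; the grid spacing and the helper name are assumptions.

```python
import math

def normal_grid_weights(grid, mu, sigma):
    """Normal density evaluated at the grid points, normalized to sum to one."""
    dens = [math.exp(-0.5 * ((t - mu) / sigma) ** 2) for t in grid]
    total = sum(dens)
    return [d / total for d in dens]

# Hypothetical grid of 17 equally spaced points from -4 to 4.
grid = [-4.0 + 0.5 * q for q in range(17)]
w = normal_grid_weights(grid, mu=-1.5, sigma=1.0)
print(round(sum(w), 12), grid[w.index(max(w))])
```

In the nonparametric approximation the same list of weights would simply be estimated directly instead of being computed from a mean and a standard deviation.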

3.0 Marginal Estimation of Structural Parameters 

Assuming the Saltus item response model, (4) is the conditional probability of a 
response pattern x. Assuming further the population model described above, the 
marginal probability of x, or the probability of observing x from an examinee selected at 
random from the population, is given as 

$$p(x) = p(x \mid \beta, T, \pi, \alpha) = \sum_h \pi_h \int P_h(x \mid \theta, \beta, T)\, g_h(\theta \mid \alpha)\, d\theta. \qquad (5)$$

Let $X = (x_1, \ldots, x_N)$ be the response matrix of a sample of N examinees to n test items.
A realization of X induces the marginal likelihood function for $(\beta, T, \pi, \alpha)$, as the product
over examinees of factors like (5):

$$L(X \mid \beta, T, \pi, \alpha) = \prod_i p(x_i \mid \beta, T, \pi, \alpha). \qquad (6)$$

We refer to $\beta$, T, $\pi$, and $\alpha$ as the structural parameters of the problem. Their number
remains constant irrespective of N. The incidental parameters $\theta$ and $\phi$, whose numbers
increase proportionally as N increases, have been eliminated by marginalizing over their
respective distributions as in (5). MML estimation proceeds by finding the values of the
structural parameters that maximize (6).
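
To make (5) and (6) concrete, the following sketch replaces the integral with a weighted sum over a fixed grid. It is a minimal illustration under assumed inputs: the grid, the weights, the parameter values, and the function names are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def pattern_prob(x, theta, beta, tau_h, item_classes):
    """P_h(x | theta) for one stage, with tau_h the row of T for that stage."""
    prob = 1.0
    for x_j, b_j, k in zip(x, beta, item_classes):
        p = logistic(theta - b_j + tau_h[k])
        prob *= p if x_j else 1.0 - p
    return prob

def marginal_prob(x, beta, tau, pi, weights, grid, item_classes):
    """Grid approximation to (5): sum_h pi_h sum_q w_hq P_h(x | Theta_q)."""
    return sum(
        pi[h] * sum(w * pattern_prob(x, t, beta, tau[h], item_classes)
                    for w, t in zip(weights[h], grid))
        for h in range(len(pi)))

def log_likelihood(X, *params):
    """Logarithm of (6): sum over examinees of log p(x_i)."""
    return sum(math.log(marginal_prob(x, *params)) for x in X)

# Hypothetical two-item, two-stage example.
beta, item_classes = [-0.5, 0.5], [0, 1]
tau, pi = [[0.0, 0.0], [0.0, 1.5]], [0.5, 0.5]
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
weights = [[0.10, 0.40, 0.30, 0.15, 0.05], [0.05, 0.15, 0.30, 0.40, 0.10]]
params = (beta, tau, pi, weights, grid, item_classes)
print(log_likelihood([[1, 0], [1, 1], [0, 0]], *params))
```

Because the weights within each stage and the stage proportions both sum to one, the approximated marginal probabilities of all possible response patterns also sum to one.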

Equation (6) is an "incomplete data" likelihood function of the form addressed by
Dempster, Laird, and Rubin (1977). Estimating the structural parameters would be
straightforward if values of $\theta$ and $\phi$ were observed from each examinee along with his or
her response vector x; this would be a "complete data" problem. The EM algorithm
maximizes the incomplete-data likelihood (6) iteratively. The E-step, or expectation step
of each cycle, calculates the expectations of the sufficient statistics that the complete-data
problem would require, conditional on the observed data and provisional estimates of the
structural parameters. The M-step, or maximization step, solves what looks like a
complete-data maximum likelihood problem using these conditional expectations of 
sufficient statistics. The resulting maxima for the structural parameters are improved 
estimates of the incomplete-data solution, and serve as input to the next E-step. 

We employ the variation of the EM algorithm used by Bock and Aitkin (1981) to 
estimate item parameters, by Mislevy (1984, 1986) to estimate item parameters and 
population distribution parameters, and by Mislevy and Verhelst (1990) to estimate the 
parameters of mixtures of IRT models. Saltus is in fact a special case of the mixture 
models addressed by Mislevy and Verhelst. The integration that appears in (5) is
approximated by summation over a fixed grid of points. The E-step calculates, for each
examinee, the conditional probabilities of belonging to each stage, and, within each stage,
the probabilities that $\theta$ takes the various grid-point values. The grid points play the role
of weighted pseudo-data points in the M-step. 
3.1 Solving the "Complete Data" Problem

This section gives the ML solution that would obtain if values of $\theta$ and $\phi$ were
observed for each sampled respondent along with x. Among the N sampled examinees,
some number Q of distinct values of $\theta$ will have been observed, say $\Theta_1, \ldots, \Theta_q, \ldots, \Theta_Q$.
Now define the following statistics. $I_{ihq}$ is an indicator variable that takes the value 1 if
Examinee i is in Stage h and has proficiency $\Theta_q$, and is zero otherwise. $N_h$ is the number
of examinees observed to be in Stage h:

$$N_h = \sum_i \sum_q I_{ihq}. \qquad (7)$$

$N_{hq}$ is the number of examinees in Stage h with $\theta = \Theta_q$:

$$N_{hq} = \sum_i I_{ihq}. \qquad (8)$$

$R_{jhq}$ is the number of examinees in Stage h with $\theta = \Theta_q$ who responded correctly to Item j:

$$R_{jhq} = \sum_i x_{ij} I_{ihq}. \qquad (9)$$

The complete data likelihood for $(\beta, T, \pi, \alpha)$ induced by the observation of X, $\theta$,
and $\phi$ can be written as

$$L^*(\beta, T, \pi, \alpha \mid X, \theta, \phi) = \prod_h p(N_h \mid \pi) \prod_q p(N_{hq} \mid N_h, \alpha) \prod_j p(R_{jhq} \mid N_{hq}, \beta, T),$$

whence the complete data log likelihood

$$\lambda^*(\beta, T, \pi, \alpha \mid X, \theta, \phi) = \sum_h N_h \log \pi_h + \sum_h \sum_q N_{hq} \log g_h(\Theta_q \mid \alpha)
 + \sum_h \sum_q \sum_j \sum_k b_{jk} \{ R_{jhq} \log \Psi_{jhk}(\Theta_q) + (N_{hq} - R_{jhq}) \log[1 - \Psi_{jhk}(\Theta_q)] \}. \qquad (10)$$

ML estimation for the complete data problem proceeds by solving the likelihood
equations, which are obtained by setting to zero the first derivatives of (10) with respect
to each element of $(\beta, T, \pi, \alpha)$.
For elements of $\pi$, one must impose the constraint that $\sum_h \pi_h = 1$. This can be
accomplished with a Lagrangian multiplier (e.g., Mislevy, 1984, 369-370). One then
obtains a closed form solution for the proportion of examinees in each stage:

$$\hat{\pi}_h = N_h / N. \qquad (11)$$

For elements of $\alpha$, the likelihood equations are

$$\frac{\partial \lambda^*}{\partial \alpha} = \sum_h \sum_q N_{hq} \frac{\partial \log g_h(\Theta_q \mid \alpha)}{\partial \alpha} = 0. \qquad (12)$$

A nonparametric ML estimate of $g_h$, for example, estimates the density at each point $\Theta_q$
by the proportion of examinees from Stage h observed to have that proficiency:

$$\hat{\omega}_{hq} = N_{hq} / N_h. \qquad (13)$$

If normal distributions are assumed, their means are estimated as

$$\hat{\mu}_h = N_h^{-1} \sum_q \Theta_q N_{hq}. \qquad (14)$$

If each normal distribution can have a different variance, then

$$\hat{\sigma}_h^2 = N_h^{-1} \sum_q (\Theta_q - \hat{\mu}_h)^2 N_{hq}; \qquad (15)$$

if all are assumed to have the same variance, then

$$\hat{\sigma}^2 = N^{-1} \sum_h \sum_q (\Theta_q - \hat{\mu}_h)^2 N_{hq}. \qquad (16)$$
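
The closed-form estimates in (11), (14), and (15) amount to weighted proportions, means, and variances over the table of counts. A minimal sketch, assuming the counts are stored as a nested list indexed by stage and grid point (the function name and the counts themselves are hypothetical):

```python
import math

def closed_form_estimates(N_hq, grid):
    """Apply (11), (14), and (15) to a table of counts N_hq[h][q]."""
    N_h = [sum(row) for row in N_hq]
    N = sum(N_h)
    pi = [n / N for n in N_h]                                   # (11)
    mu = [sum(t * c for t, c in zip(grid, row)) / n             # (14)
          for row, n in zip(N_hq, N_h)]
    sigma = [math.sqrt(sum((t - m) ** 2 * c
                           for t, c in zip(grid, row)) / n)     # (15)
             for row, m, n in zip(N_hq, mu, N_h)]
    return pi, mu, sigma

# Hypothetical counts for two stages on a three-point grid.
grid = [-1.0, 0.0, 1.0]
N_hq = [[2.0, 4.0, 2.0], [1.0, 1.0, 6.0]]
pi, mu, sigma = closed_form_estimates(N_hq, grid)
print(pi, mu, sigma)
```

The function returns standard deviations rather than variances. The same code applies unchanged when the counts are expected values rather than observed ones.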

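Even in the complete data problem, closed form solutions for $\beta$ and T are not
forthcoming. They can be estimated together without heavy calculation, however, using
Newton steps for each element. From a provisional estimate $z^0$ of a generic element z, an
improved estimate is obtained as

$$z^1 = z^0 - \left[ \frac{\partial^2 \lambda^*}{\partial z^2} \right]^{-1} \frac{\partial \lambda^*}{\partial z},$$

with both derivatives evaluated at the provisional estimates.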
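For elements of $\beta$, the constraint that $\sum_j \beta_j = 0$ must be taken into account. Defining

$$\beta_n = -\sum_{j=1}^{n-1} \beta_j,$$

we obtain the required first and second derivatives shown below. For Item j, for j=1, . . . ,
n-1,

$$\frac{\partial \lambda^*}{\partial \beta_j} = -\sum_q \sum_h \sum_k \{ b_{jk} [R_{jhq} - N_{hq} \Psi_{jhk}(\Theta_q)] - b_{nk} [R_{nhq} - N_{hq} \Psi_{nhk}(\Theta_q)] \} \qquad (17)$$

and

$$\frac{\partial^2 \lambda^*}{\partial \beta_j^2} = -\sum_q \sum_h N_{hq} \sum_k \{ b_{jk} \Psi_{jhk}(\Theta_q) [1 - \Psi_{jhk}(\Theta_q)] + b_{nk} \Psi_{nhk}(\Theta_q) [1 - \Psi_{nhk}(\Theta_q)] \}. \qquad (18)$$

For Saltus parameter $\tau_{hk}$, for h=2, . . . , H and k=2, . . . , H,

$$\frac{\partial \lambda^*}{\partial \tau_{hk}} = \sum_q \sum_j b_{jk} [R_{jhq} - N_{hq} \Psi_{jhk}(\Theta_q)] \qquad (19)$$

and

$$\frac{\partial^2 \lambda^*}{\partial \tau_{hk}^2} = -\sum_q N_{hq} \sum_j b_{jk} \Psi_{jhk}(\Theta_q) [1 - \Psi_{jhk}(\Theta_q)]. \qquad (20)$$

Note that the summations over j in (19) and (20), which include the factor $b_{jk}$, serve
merely to pick up terms for only those items in item class k.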

Solving the likelihood equations for $\beta$ and T requires provisional estimates of
each to calculate the $\Psi$ terms that appear in (17) - (20). Once they are computed, a
Newton step is taken for each element in $\beta$ and T to provide improved estimates. These
are used again to calculate improved estimates of the $\Psi$s for the next Newton step. This
procedure ignores the cross second derivatives among the elements of $\beta$ and T, but, from
good starting values, converges rapidly nonetheless.
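
A single-element Newton update for one Saltus parameter can be sketched as follows, using the first and second derivatives for $\tau_{hk}$ with the item difficulties held at their provisional values. The data layout (counts for one fixed stage, a Boolean class-membership list) and the numbers in the usage example are assumptions made for illustration.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def newton_step_tau(tau_hk, beta, in_class_k, grid, N_q, R_jq):
    """One Newton update of tau_hk for a fixed stage h.

    N_q[q] is the count at grid point q for stage h, and R_jq[j][q] is the
    number of correct responses to item j at that point.
    """
    first, second = 0.0, 0.0
    for j, b_j in enumerate(beta):
        if not in_class_k[j]:
            continue  # b_jk = 0, so the item contributes nothing
        for q, theta in enumerate(grid):
            psi = logistic(theta - b_j + tau_hk)
            first += R_jq[j][q] - N_q[q] * psi           # first derivative
            second -= N_q[q] * psi * (1.0 - psi)         # second derivative
    return tau_hk - first / second

# Hypothetical case: one item with difficulty 0, one grid point at 0,
# 100 examinees of whom 73.1 (an expected count) answered correctly.
tau = 0.0
for _ in range(10):
    tau = newton_step_tau(tau, [0.0], [True], [0.0], [100.0], [[73.1]])
print(tau)
```

In this toy case the iterations settle at the logit of the proportion correct, which is the value at which the first derivative vanishes.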
3.2 Solving the Incomplete Data Problem

We make the simplifying assumption that $\theta$ parameters can take only Q possible
values, namely $\Theta_1, \ldots, \Theta_Q$. These values will play the role of the observed values $\Theta_q$
discussed in the preceding section. In any actual application of the Saltus model, neither
the values of $\theta$ nor $\phi$ are known, so neither will be the values of the indicator variables
$I_{ihq}$. If the values of the structural parameters $\beta$, T, $\pi$, and $\alpha$ were known, however, it
would be possible to calculate the expected values of the $I_{ihq}$'s given $x_i$'s:

$$\bar{I}_{ihq} = E(I_{ihq} \mid x_i, \beta, T, \pi, \alpha)
 = \frac{\pi_h\, g_h(\Theta_q \mid \alpha)\, P_h(x_i \mid \Theta_q, \beta, T)}{\sum_k \pi_k \sum_r g_k(\Theta_r \mid \alpha)\, P_k(x_i \mid \Theta_r, \beta, T)}. \qquad (21)$$

In the E-step of the EM approach to maximizing the marginal likelihood function
(6), one evaluates (21) using provisional estimates of $\beta$, T, $\pi$, and $\alpha$. From
these, one obtains expectations of the summary statistics defined in (7) - (9); call them
$\bar{N}_h$, $\bar{N}_{hq}$, and $\bar{R}_{jhq}$. Note that the $\Theta_q$ values play the role that observed $\theta$ values played
in the complete data solution. Now, however, rather than observed counts of examinees
at such a point, we have expected values of those counts.

In the M-step, one uses $\bar{N}_h$, $\bar{N}_{hq}$, and $\bar{R}_{jhq}$ in place of their observed counterparts
to solve facsimiles of the complete data likelihood equations via (11) - (20). Cycles of E-
and M-steps are continued until successive changes are suitably small. Because the EM
algorithm can be slow to converge, accelerating methods such as Ramsay's (1975) may
be employed.

Equation (21) will be recognized as an application of Bayes theorem, giving the
posterior probability that $\theta_i = \Theta_q$ and $\phi_{ih} = 1$ after observing $x_i$. The normalizing constant
in the denominator is an approximation of $p(x_i)$ as given in (5). During the E-step, one
may therefore accumulate the sum $-2 \sum_i \log p(x_i)$ to track
improvement in fit over cycles, or to compare the fit of various values of structural
parameters. For example, one can evaluate the impact of setting a particular Saltus
parameter to zero, or compare a normal solution with equal variances in all stages against
a solution that permits different variances.
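
The E-step computations can be sketched as below: posterior weights over stages and grid points for each examinee, the accumulated value of $-2 \sum_i \log p(x_i)$, and the expected counts that feed the M-step. This is an illustrative sketch, not the authors' program; the data structures, function names, and parameter values are assumptions.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def pattern_prob(x, theta, beta, tau_h, item_classes):
    prob = 1.0
    for x_j, b_j, k in zip(x, beta, item_classes):
        p = logistic(theta - b_j + tau_h[k])
        prob *= p if x_j else 1.0 - p
    return prob

def e_step(X, beta, tau, pi, weights, grid, item_classes):
    """Return posterior weights I_bar[i][h][q] and -2 * sum_i log p(x_i)."""
    I_bar, deviance = [], 0.0
    for x in X:
        joint = [[pi[h] * weights[h][q]
                  * pattern_prob(x, t, beta, tau[h], item_classes)
                  for q, t in enumerate(grid)] for h in range(len(pi))]
        p_x = sum(map(sum, joint))     # normalizing constant, approximates p(x_i)
        I_bar.append([[v / p_x for v in row] for row in joint])
        deviance += -2.0 * math.log(p_x)
    return I_bar, deviance

def expected_counts(I_bar, X, H, Q):
    """Expected counterparts of N_hq and R_jhq."""
    N_hq = [[sum(I[h][q] for I in I_bar) for q in range(Q)] for h in range(H)]
    R_jhq = [[[sum(x[j] * I[h][q] for x, I in zip(X, I_bar))
               for q in range(Q)] for h in range(H)] for j in range(len(X[0]))]
    return N_hq, R_jhq

# Hypothetical two-item, two-stage example.
X = [[1, 0], [1, 1], [0, 0]]
beta, item_classes = [-0.5, 0.5], [0, 1]
tau, pi = [[0.0, 0.0], [0.0, 1.5]], [0.5, 0.5]
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
weights = [[0.10, 0.40, 0.30, 0.15, 0.05], [0.05, 0.15, 0.30, 0.40, 0.10]]
I_bar, deviance = e_step(X, beta, tau, pi, weights, grid, item_classes)
N_hq, R_jhq = expected_counts(I_bar, X, H=2, Q=5)
print(deviance)
```

Comparing the deviance from two fitted configurations, for instance with and without a Saltus parameter fixed at zero, is one way to carry out the comparisons mentioned above.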



3.3 Approximating the Information Matrix 

Under the grid-point approximation described above, a method described by 
Louis (1982, Section 3.2) provides an approximation of the observed information matrix 
for MML estimates of the structural parameters in the Saltus model. For brevity, denote 
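the parameter $(\beta, T, \pi, \alpha)$ by $\eta$. Louis' approximation is a sum over subjects of
cross-products of expected complete-data log likelihood first derivatives:

$$I(\eta) \approx \sum_i \left[ \sum_h \sum_q \bar{I}_{ihq} \frac{\partial \lambda^*(\eta \mid x_i, I_{ihq}=1)}{\partial \eta} \right] \left[ \sum_h \sum_q \bar{I}_{ihq} \frac{\partial \lambda^*(\eta \mid x_i, I_{ihq}=1)}{\partial \eta} \right]'.$$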

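The required terms for $\beta$ and T are simplified versions of (17) and (19) respectively:

$$\frac{\partial \lambda^*(\eta \mid x_i, I_{ihq}=1)}{\partial \beta_j} = -\sum_k \{ b_{jk} [x_{ij} - \Psi_{jhk}(\Theta_q)] - b_{nk} [x_{in} - \Psi_{nhk}(\Theta_q)] \}$$

and

$$\frac{\partial \lambda^*(\eta \mid x_i, I_{ihq}=1)}{\partial \tau_{hk}} = \sum_j b_{jk} [x_{ij} - \Psi_{jhk}(\Theta_q)].$$

Incorporating the constraint that the $\pi$'s must sum to one, we obtain for $\pi_h$, for h=1, . . .,
H-1,

$$\frac{\partial \lambda^*(\eta \mid x_i, I_{ih'q}=1)}{\partial \pi_h} =
\begin{cases} 1/\pi_h & \text{if } h' = h, \\ -1/\pi_H & \text{if } h' = H, \\ 0 & \text{otherwise.} \end{cases}$$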
For means and variances in the normal solution,

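$$\frac{\partial \lambda^*(\eta \mid x_i, I_{ihq}=1)}{\partial \mu_h} = \frac{\Theta_q - \mu_h}{\sigma_h^2}$$

and

$$\frac{\partial \lambda^*(\eta \mid x_i, I_{ihq}=1)}{\partial \sigma_h^2} = \frac{(\Theta_q - \mu_h)^2 - \sigma_h^2}{2 \sigma_h^4}.$$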

If the observed information matrix is positive definite and the solution is the
global maximum of the likelihood, its inverse is a large-sample approximation of the
sampling variance of the MML estimates. In particular, square roots of the diagonal
entries of $I^{-1}$ are large-sample standard errors.

In addition to indicating the precision with which structural parameters have been 
estimated, the observed information matrix contributes to an understanding of the 
identification status of the model. As noted above, resolving the scale indeterminacies is 
necessary but not sufficient for identification. Another necessary condition is that the 
true information matrix be positive definite. Since the observed information matrix is a 
consistent estimate of the information matrix, a positive definite observed information
matrix is supportive evidence of local identification. That is, in the neighborhood of the 
MML estimates, changes in parameter values imply changes in modelled response 
probabilities. The reader is referred to McHugh (1956) and Goodman (1974) for 
additional discussion of these issues in the closely-related context of latent class analysis. 

3.4 Starting Values 

The closer starting values are to final estimates, the fewer EM cycles will be 
required. Good starting values for the Saltus model can be based on Wilson's (1989) 
approximate estimation procedures. Modified slightly to conform to the identifying
constraints specified in this presentation, the required steps are as follows. 
1. Assign each examinee to a stage based on his observed response pattern. This
will be straightforward in those cases in which successive stages imply greater
probabilities of correct response to all items; total scores then identify "most
likely" values of stage membership. In other cases, however, total scores will not
suffice, as when moving to a higher stage means higher probabilities of success
for some item classes, but lower probabilities for classes of items formerly
answered correctly for the wrong reasons. Here provisional assignments for some
examinees will depend on their relative successes in contrasting item classes. If it
is still not possible to identify a most likely stage from among two or more
possibilities, assign the examinee to one of them at random.

2. Use as initial estimates of $\pi$ the proportions of examinees provisionally assigned
to the stages. If no examinees have been assigned to a stage, use a small value
such as .25/H as the starting value for that stage and adjust other probabilities
accordingly.

3. Obtain estimates of item and person parameters under the simple Rasch model
independently for each stage, using only the examinees provisionally assigned to
that stage. If an item has a zero or perfect score, assign it a logit value based on
Cohen's (1979) approximation for an item with a score of 1 or 1 less than the
maximum score, respectively. Linearly transform the results so that

a. the item parameter estimates for Stage 1 are centered at zero, and

b. the average item difficulty for item Class 1 takes the same value in all
stage calibrations.

4. Use as starting values for $\beta$ the item parameter estimates from the Stage 1
calibration run.
5. To calculate starting values for $\alpha$, use person ability estimates from each stage's
calibration run, rescaled by the linear transformations applied to item difficulties
in Step 3 above. For example, if normal distributions have been posited,
calculate the mean and standard deviation of rescaled $\theta$'s of the examinees
provisionally assigned to each stage.

6. Calculate the average item difficulty in each item Class k in each rescaled
calibration run h, denoting the results $\bar{\beta}_{hk}$. Use as starting values for T the values

$$\tau_{hk} = \bar{\beta}_{hk} - \bar{\beta}_{1k}, \quad h = 2, \ldots, H; \; k = 2, \ldots, H.$$

If additional constraints have been posited among $\tau$'s, appropriate averages or
contrasts of the values so obtained may be used.
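
The second step of the procedure, including the fallback for a stage to which no examinee was assigned, might be sketched as follows. The text says only to "adjust other probabilities accordingly"; rescaling the remaining proportions so that the total is one is an assumption of this sketch, as are the function name and the 0-based stage labels.

```python
def starting_pi(assignments, H):
    """Initial stage proportions from provisional stage assignments.

    Stages with no assigned examinees receive the small value .25/H; the
    remaining proportions are rescaled (an assumption) so the total is one.
    """
    counts = [assignments.count(h) for h in range(H)]
    n = len(assignments)
    floor = 0.25 / H
    empty = [c == 0 for c in counts]
    remaining = 1.0 - floor * sum(empty)
    return [floor if e else remaining * c / n for c, e in zip(counts, empty)]

# Hypothetical provisional assignments of four examinees to three stages.
pi0 = starting_pi([0, 0, 0, 1], H=3)
print(pi0)
```

Keeping every starting proportion strictly positive matters because a stage that starts at zero receives zero posterior weight in the E-step and can never recover in later cycles.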

4.0 Empirical Bayes Estimates of Examinee Parameters

Once final estimates of structural parameters have been obtained, posterior
probabilities of stage membership can be calculated for any examinee, and $\theta$ can be
estimated conditional on stage membership. One begins by evaluating the expectations
of the indicator variables as shown in (21), using the MML estimates of $\beta$, T, $\pi$, and
$\alpha$. For a response vector $x_i$, the empirical Bayes approximation of probability of
membership in Stage h is given as

$$P(\phi_{ih} = 1 \mid x_i) \approx \sum_q \bar{I}_{ihq}. \qquad (22)$$

Conditional on membership in Stage h, the posterior expectation of $\theta$ is approximated as

$$\bar{\theta}_{ih} = E(\theta \mid \phi_{ih} = 1, x_i) \approx \sum_q \Theta_q \bar{I}_{ihq} \Big/ \sum_q \bar{I}_{ihq}, \qquad (23)$$

and the posterior variance is

$$\mathrm{Var}(\theta \mid \phi_{ih} = 1, x_i) \approx \sum_q (\Theta_q - \bar{\theta}_{ih})^2 \bar{I}_{ihq} \Big/ \sum_q \bar{I}_{ihq}. \qquad (24)$$
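
Given one examinee's posterior weights over stages and grid points, the three empirical Bayes quantities reduce to sums over the grid. A small sketch follows; the function name and the weights are hypothetical.

```python
def empirical_bayes(I_bar_i, grid):
    """Stage probability, posterior mean, and posterior variance of theta
    for each stage, from one examinee's weights I_bar_i[h][q]."""
    results = []
    for row in I_bar_i:
        p_h = sum(row)                                             # stage probability
        mean = sum(t * w for t, w in zip(grid, row)) / p_h         # posterior mean
        var = sum((t - mean) ** 2 * w for t, w in zip(grid, row)) / p_h
        results.append((p_h, mean, var))
    return results

# Hypothetical posterior weights for one examinee (they sum to one overall).
grid = [-1.0, 0.0, 1.0]
I_bar_i = [[0.1, 0.2, 0.1], [0.0, 0.3, 0.3]]
for h, (p_h, mean, var) in enumerate(empirical_bayes(I_bar_i, grid), start=1):
    print(h, p_h, mean, var)
```

The sketch assumes every stage has nonzero posterior weight; a stage whose weights are all zero would need to be skipped before dividing.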
5.0 Example 1: Simulated Data

This section describes a modest simulation comparing the performance of the
MML algorithm with a solution treating examinees' stage memberships as if they were
known true parameter values. Wilson's (1984) original approximations were based on a
joint maximum likelihood (JML) estimation algorithm, and proceeded by first using an
auxiliary algorithm to place each person into one or the other of the Saltus stages. This
classification was not altered in the course of the algorithm. Under these circumstances,
there is no mixture present, so the model is considerably simplified. The approach was
found to give poor results under even generous conditions, and Wilson devised a
correction based on "tailored simulations" to bring the estimates of the Saltus parameters
closer to generating values. This was not a very satisfactory situation, and, in part,
motivated this paper. In this simulation, we use an MML algorithm rather than a JML
algorithm to estimate the remaining item and examinee-group parameters, to focus the
comparison on the way examinee group membership is handled. In addition we judged
that "tailored simulation", although somewhat efficacious in the previous work, should
not be a part of the comparison. It is a complex and time-consuming process that few
analysts would perform in practice.

Two-class Saltus item-response data were generated in a 2x2 design, based on the 
following two factors: 

• The number of items in each Saltus class: moderate (10) or small (4). One would 
expect more difficulty recovering parameters with the smaller number of items, 
because less information is available about examinees' stage memberships. 

• The value of the discontinuity parameter $\tau_{22}$: moderate (1.5) or small (0.5). One
would expect the smaller discontinuity value to cause more difficulty in parameter
recovery, again because classification of examinees according to stage
membership is more problematic.

Each condition was replicated ten times, with 500 simulees drawn from each of two
normally-distributed examinee stage groups, with means of -1.5 and 0.5 and standard
deviations of .25. Saltus parameters were estimated for each replication under both the
MML approach with a normal distribution and the "$\hat{\phi}$ as $\phi$" approach.

Table 1 gives the generating values and the averages of the parameter estimates 
over the ten replications for the 10-items-per-class conditions, for both the moderate and 
small discontinuity conditions. There were ten items in each of two Saltus levels (items 1 
to 10 and 11 to 20, respectively), with difficulties uniformly spread from -1.5 to 1.5.
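
One replication of the 10-items-per-class, moderate-discontinuity condition could be generated along the following lines. This is a sketch under assumptions, not the program used in the study: "uniformly spread" is taken to mean equally spaced difficulties within each class, the stage means and standard deviations are taken as stated in the text, and the function name and seed are hypothetical.

```python
import math
import random

def simulate_saltus(n_per_stage, beta, item_classes, tau, means, sds, seed=0):
    """Draw (stage, response vector) pairs under the Saltus response model."""
    rng = random.Random(seed)
    data = []
    for h, (mu, sd) in enumerate(zip(means, sds)):
        for _ in range(n_per_stage):
            theta = rng.gauss(mu, sd)
            x = []
            for b_j, k in zip(beta, item_classes):
                p = 1.0 / (1.0 + math.exp(-(theta - b_j + tau[h][k])))
                x.append(1 if rng.random() < p else 0)
            data.append((h, x))
    return data

# Ten equally spaced difficulties from -1.5 to 1.5 in each of two classes.
beta = [-1.5 + 3.0 * j / 9 for j in range(10)] * 2
item_classes = [0] * 10 + [1] * 10
tau = [[0.0, 0.0], [0.0, 1.5]]          # moderate discontinuity condition
data = simulate_saltus(500, beta, item_classes, tau,
                       means=[-1.5, 0.5], sds=[0.25, 0.25])
print(len(data), len(data[0][1]))
```

The generating stage label is kept with each response vector only so that recovered classifications can be compared with it; an estimation routine would see the response vectors alone.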

Insert Table 1 about here 

Consider first the combination of conditions that was expected to provide the best
results, namely moderate number of items and moderate discontinuity. For the mixture
model algorithm (column 3), the item parameters have been estimated quite well and the
size of the Saltus stage groups is quite accurate, but the Saltus parameter has been
underestimated by 0.11, or about 7 to 8 percent of its value. The ability distributions
have been recaptured well. The "$\hat{\phi}$ as $\phi$" approach (column 4) estimates item difficulties
in the right order, but inflated away from zero. The Saltus parameter is overestimated by
almost 300 percent, although the proportional representation of the Saltus stage groups is
about right. The mean of the lower group is over a half a logit above its generating value,
and its standard deviation is somewhat larger than it should be. The second stage's mean
is well-estimated, and its standard deviation is also too large. Wilson's "tailored
simulations" would have reduced the overestimation of the Saltus parameter, but would
not have addressed any of the other problems.
The fifth column of Table 1 shows MML results for the small discontinuity
condition. Compared to the moderate discontinuity condition, the item parameters are
slightly deflated towards zero, and the size of the Stage 1 group has been estimated as .54
rather than .50. The Saltus parameter has again been underestimated, this time by 28
percent of its generating value. The stage means have both been overestimated
somewhat, but their standard deviations behaved differently: the first is about twice as
large as the generating value, while the second is only half as large. Column 6 contains
the results for the "$\hat{\phi}$ as $\phi$" approach. Here the item difficulties are inflated away from
zero to about the same extent that the mixture model estimates were deflated back
towards zero, and the size of Stage 1 group has been estimated as .56 rather than .50.
Once again the Saltus parameter is greatly overestimated, this time by 500 percent. Both
stage means have shrunk towards zero considerably, and both standard deviations are
inflated, although to different degrees.

Table 2 presents generating values and results for the 4-items-per-class
conditions. Among MML estimates (column 3), the item parameters have been estimated
quite well and the size of the Saltus stage groups is quite accurate, but the Saltus
parameter has again been underestimated, by about 10 percent. The ability distributions
have been recaptured fairly well, although the standard deviation of the Stage 2 group is
underestimated. The "$\hat{\phi}$ as $\phi$" approach (column 4) shows an entirely different picture.
The item difficulties are in the right order, but all are inflated away from zero somewhat.
The Saltus parameter is overestimated by almost 200 percent, and the size of the Stage 1
group is overestimated. The mean of this lower group is almost a logit above its generating
value, while the Stage 2 group's mean is less than it should be. Both standard deviations
are overestimated.

Insert Table 2 about here
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The fifth column of Table 2 shows the MML results for the small discontinuity 
condition. Compared to the moderate discontinuity condition, the item parameters have 
been deflated towards zero, and the size of Stage 1 group has been overestimated even 
more. The Saltus parameter has again been underestimated — essentially as zero. The 
stage group means have both been overestimated again, but ♦heir standard deviations have 
behaved differei, ly: the first is about twice as large as the generating value, the second is 
about half as large. Column 6 contains the corresponding "$ as <1>" results. Here the item 
difficulties are slightiy inflated away from zero, and the size of the Stage 1 group has 
been considerably overestimated. Once again the Saltus parameter is greatly 
overestimated, this thne by 300 percent Both stage group means have been reduced 
towards a common value, while both standard deviations are inflated. 
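
For readers who wish to experiment with conditions like these, the following Python sketch draws responses from a two-stage, two-class Saltus model using the generating values of Table 2 with a Saltus parameter of 0.5. It is a minimal illustration, not the program used for the simulations reported here. Several things in it are assumptions: the response function is taken to be the logistic of θ - β_j + τ_hk, the first four items are taken to form Class 1 and the last four Class 2, and the sample size of 1000, the seed, and the name `simulate_saltus` are arbitrary choices of ours.

```python
import math
import random


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def simulate_saltus(n, betas, item_class, tau, pi, mu, sigma, seed=0):
    """Draw (stage, theta, responses) for n simulated examinees.

    tau[h][k] is the shift applied for examinees in stage h on items
    of class k (both 0-based). Within-stage theta is normal.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        stage = rng.choices(range(len(pi)), weights=pi)[0]
        theta = rng.gauss(mu[stage], sigma[stage])
        responses = [
            1 if rng.random() < logistic(theta - b + tau[stage][k]) else 0
            for b, k in zip(betas, item_class)
        ]
        data.append((stage, theta, responses))
    return data


# Generating values from Table 2 (small number-of-items condition).
betas = [-1.50, -1.20, -1.00, -0.50, 0.50, 1.00, 1.20, 1.50]
item_class = [0, 0, 0, 0, 1, 1, 1, 1]  # assumed split into two classes
tau = [[0.0, 0.0], [0.0, 0.5]]         # only the Stage 2, Class 2 shift is free

sample = simulate_saltus(
    1000, betas, item_class, tau,
    pi=[0.5, 0.5], mu=[-1.5, 0.5], sigma=[0.25, 0.25],
)
print(sample[0])
```

Because the stage of each simulated examinee is retained, the same output can be used to compare estimates that treat stage as known with estimates that must infer it.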

In summary, the most salient of the results from the simulations are as follows: 

1. Under the moderate number of items condition, and the moderate discontinuity 
condition, MML gives very good parameter recovery, with the exception of an 
underestimate of the Saltus parameter of an order somewhat less than 10 percent. 

2. Under the mixed conditions (i.e., the "better" condition for one factor, and the 
"poorer" condition for the other), the mixture model gives good parameter 
recovery. 

3. Under the small number of items condition and the small discontinuity condition, 
the mixture model gives noticeably poorer estimates of several 
parameters, especially the Saltus parameter. 

4. The "φ̂ as φ" approach gives uniformly poor estimates for the Saltus parameter, 
invariably overestimating it. The other parameters follow roughly the same 
relative patterns as for the MML results, although they are worse in almost all 
cases. 

6.0 Example 2: Mixed Number Subtraction 

The data analyzed in this example are responses of 325 junior high school 
students to 20 open-ended items dealing with mixed-number subtraction, gathered by 
Kikumi Tatsuoka and her colleagues. More detailed descriptions of the data and 
extensive cognitive analyses of the domain can be found in Klein, Birenbaum, Standiford, 
and Tatsuoka (1981), and an analysis based on Tatsuoka's "rule-space" approach appears 
in Tatsuoka (1990). We neglect many aspects of this rich data set in the following 
example, in order to illustrate how the Saltus model captures a key feature of the 
domain: increasing competence possesses both qualitative and quantitative aspects, as 
learners master procedures and become more proficient in applying them. We contrast 
the Saltus solution with an analysis based on the RM shown as (1) and the 2-parameter 
logistic item response model: 

\[ P(x_j = 1 \mid \theta, a_j, \beta_j) = \Psi[a_j(\theta - \beta_j)], \]

where a_j, the item slope parameter, indicates the sensitivity with which the probability 
of a correct response to item j reacts to changes in θ. Items with high values of a_j are 
considered to be good at discriminating high from low competence, from the perspective 
of the 2PL. 
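
The role of the slope can be made concrete with a short Python sketch. It assumes that Ψ is the standard logistic function; the helper names are ours. The numerical values are the RM and 2PL estimates for item 8 in Table 3.

```python
import math


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def p_rasch(theta: float, beta: float) -> float:
    """RM: every item has unit slope."""
    return logistic(theta - beta)


def p_2pl(theta: float, a: float, beta: float) -> float:
    """2PL: the slope a scales the distance between theta and beta."""
    return logistic(a * (theta - beta))


# Item 8 in Table 3: RM difficulty -.31; 2PL difficulty .06 and slope 1.68.
a8, beta8 = 1.68, 0.06
for theta in (-1.0, 0.0, 1.0):
    print(theta, round(p_rasch(theta, -0.31), 3), round(p_2pl(theta, a8, beta8), 3))

# A one-unit increase in theta raises the 2PL logit by exactly the slope.
gain = logit(p_2pl(1.0, a8, beta8)) - logit(p_2pl(0.0, a8, beta8))
print(round(gain, 2))
```

With a slope of 1 the 2PL reduces to the RM, which is why the RM is the natural baseline in the comparisons that follow.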

Table 3 presents the text of the items, percents-correct, and item parameter 
estimates under the RM and 2PL. These item parameters were obtained with Mislevy 
and Bock's (1989) PC BILOG program, assuming a normal distribution for θ and setting 
the scale so that the arithmetic mean of the estimated β values was 0 and the geometric 
mean of the a values was 1. Because we renumbered the items in order to group them in 
Saltus classes, the original Klein et al. item numbers are also shown. The item classes are 
based on whether an item requires two key procedures for its solution: finding a common 
denominator, and converting between mixed numbers and improper fractions. Items in 
Class 1 require neither, items in Class 2 require finding a common denominator, items in 
Class 3 require converting, and possibly finding a common denominator as well. This 
implies that the qualitative aspect of students' development is signaled by acquiring the 
common-denominator skill, then the converting skill. This path of development is not 
necessary either logically or psychologically, but it is not unreasonable to posit in this 
example because it accords with the instructional sequence. 

Insert Table 3 about here 

There is a clear pattern in the percentages of correct response. The items in each 
item class are of similar difficulty, and the average difficulties increase from the first 
class, to the second, to the third, with average percents correct of .73, .55, and .34. The 
RM difficulty parameters reflect this pattern directly, since they are nearly linear 
transformations of the logits of the probabilities. The RM would suggest increasing 
competence to take the form of uniformly increasing chances of correct response on all 
items, in the logit metric. The 2PL would also posit linear increases in items' logits of 
correct response, but allow for faster or slower rates from one item to another, in 
proportion to their a parameters. Note the systematically higher 2PL slopes for the Class 
2 and Class 3 items. The 2PL represents a substantially better fit to the actual response 
data, improving BILOG's chi-square index of comparative fit by 416 at the cost of 20 
additional parameters (i.e., slopes). 
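
The near-linear relation between the RM difficulties and the logits of the percents-correct can be checked directly from the entries of Table 3. The Python sketch below is only an illustration of that remark: it computes the Pearson correlation between the negative logit of each item's proportion correct and its RM difficulty. The helper `pearson` is written out by hand so that the block has no dependencies.

```python
import math

# Proportions correct and RM difficulties for items 1-20, from Table 3.
pct = [.79, .71, .69, .71, .75, .74,
       .50, .56, .51, .61,
       .37, .33, .31, .37, .31, .38, .34, .41, .26, .31]
rm = [-1.36, -.92, -.86, -.94, -1.16, -1.09,
      -.04, -.31, -.05, -.51,
      .54, .76, .84, .56, .82, .49, .69, .37, 1.10, .84]


def pearson(x, y):
    """Plain Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Harder items have lower proportions correct, hence the sign change.
neg_logit = [-math.log(p / (1.0 - p)) for p in pct]
r = pearson(neg_logit, rm)
print(round(r, 3))
```

A correlation very close to 1 indicates that, for these data, the RM difficulties carry essentially the same information as the item percents-correct.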

Tables 4 through 6 present the results of the MML Saltus analysis, with normal 
distributions fitted within developmental stages. The Saltus solution offers a slightly 
greater improvement over the RM than does the 2PL: 449 chi-square units at the cost of 
12 additional parameters (4 τs, 3 means and 3 standard deviations, and 2 independent 
proportions). The Saltus βs in Table 4 are item difficulty parameters for examinees in 
Stage 1. They are more spread out than those of the RM, exhibiting a large gap, for these 
examinees, between the items in Class 1 and the items in Classes 2 and 3. The gap closes 
considerably when we look at the difficulty estimates that pertain to Stage 2 examinees; 
Class 2 items become just as easy for these students as Class 1 items. The shift is by the 
amount of the τ22 parameter in Table 5. Class 3 items still remain relatively difficult for 
Stage 2 examinees. The discontinuity associated with examinees in Stage 3 is the drop in 
difficulty of Class 3 items. 
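
The implied within-stage difficulties in Table 4 follow from the Stage 1 difficulties in Table 4 and the Saltus parameters in Table 5. The Python sketch below assumes that the difficulty of item j for an examinee in stage h is β_j - τ_hk, where k is the class of item j; the dictionary layout and the function name are ours. Because it works from the rounded table entries, its results can differ from the printed table in the second decimal place.

```python
# Stage 1 difficulties (Table 4), keyed by item number.
BETA = {
    1: -2.94, 2: -2.34, 3: -2.26, 4: -2.38, 5: -2.66, 6: -2.57,
    7: 0.00, 8: -0.52, 9: -0.02, 10: -0.94,
    11: 1.32, 12: 1.77, 13: 1.97, 14: 1.36, 15: 1.93,
    16: 1.20, 17: 1.64, 18: 0.95, 19: 2.51, 20: 1.97,
}

# Saltus parameters (Table 5), keyed by (examinee stage, item class).
TAU = {
    (1, 1): 0.0, (1, 2): 0.0, (1, 3): 0.0,
    (2, 1): 0.0, (2, 2): 2.85, (2, 3): 1.00,
    (3, 1): 0.0, (3, 2): 1.21, (3, 3): 3.13,
}


def item_class(item: int) -> int:
    """Items 1-6 are Class 1, 7-10 Class 2, 11-20 Class 3."""
    if item <= 6:
        return 1
    return 2 if item <= 10 else 3


def implied_difficulty(item: int, stage: int) -> float:
    return BETA[item] - TAU[(stage, item_class(item))]


for item in (1, 7, 11):
    print(item, [round(implied_difficulty(item, h), 2) for h in (1, 2, 3)])
```

The Class 1 items are unchanged across stages, which is what fixes the scale on which the shifts for Classes 2 and 3 are expressed.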

Insert Tables 4-6 about here 

In addition to the shifts in relative item difficulties, the developmental stages are 
also distinguished in terms of their θ distributions (noting, of course, that θ has a different 
meaning for each stage, in terms of its implications for success on items from different 
classes). Figure 1 illustrates the relative locations of item difficulties and examinee 
distributions for the three stages. The locations of the Class 1 items set the scale; they are 
identical across the three panels. Being in Stage 1 typically implies middling chances of 
answering Class 1 items correctly, and practically no chance at Class 2 or 3 items. The 
Stage 2 line shows a noticeably higher θ distribution and a marked drop in the relative 
difficulty of Class 2 items. The Stage 3 line shows a slightly higher θ distribution and a 
marked drop in the relative difficulties of Class 3 items. These patterns are reflected in 
Table 7, which combines stage means with item parameters to give typical probabilities 
of correct response to each item from examinees of different stages. 
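
The entries of Table 7 can be approximated by evaluating the response function at each stage mean. The Python sketch below assumes the probability of success is the logistic of μ_h - β_j + τ_hk, using the rounded estimates from Tables 4 through 6 for one item from each class; the names are ours.

```python
import math

# (Stage 1 difficulty, item class) for three illustrative items, Table 4.
ITEMS = {1: (-2.94, 1), 7: (0.00, 2), 11: (1.32, 3)}

# Saltus parameters by (examinee stage, item class), Table 5.
TAU = {
    (1, 1): 0.0, (1, 2): 0.0, (1, 3): 0.0,
    (2, 1): 0.0, (2, 2): 2.85, (2, 3): 1.00,
    (3, 1): 0.0, (3, 2): 1.21, (3, 3): 3.13,
}

# Stage means, Table 6.
MU = {1: -2.27, 2: -0.77, 3: -0.44}


def p_at_stage_mean(item: int, stage: int) -> float:
    """Probability of success for an examinee at the mean of the given stage."""
    beta, k = ITEMS[item]
    z = MU[stage] - beta + TAU[(stage, k)]
    return 1.0 / (1.0 + math.exp(-z))


for item in ITEMS:
    print(item, [round(p_at_stage_mean(item, h), 2) for h in (1, 2, 3)])
```

For item 7 the typical probability is lower in Stage 3 than in Stage 2, an instance of the kind of reversal that the Saltus model, unlike the RM or 2PL, can represent.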

Insert Figure 1 and Table 7 about here 

Table 8 further details the discontinuities that Saltus can accommodate by showing 
observed responses and modeled probabilities for five examinees. We see that: 

• Examinee 4 got only half the items right, in a pattern spread across item classes. 
The RM and the 2PL accommodate this pattern well. Saltus handles it with a 
posterior concentrated on Stage 3, with a low θ value. There are enough Class 2 
and Class 3 items correct to believe the student is beginning to use common 
denominator and converting procedures, but is not working with accuracy and 
consistency; this concords with missing two of the six easy Class 1 items. 

• Examinee 7 got half the Class 1 items right, three of the Class 2 items, and none 
of the Class 3 items. From the point of view of the RM and 2PL, some correct 
Class 3 responses would be expected. Saltus Stage 2 accords well with this pattern, 
accounting for a dropoff between Class 2 and Class 3 items for students at this 
stage. 

• Examinee 12 got two Class 1 items right, one Class 2 item, and no Class 3 items. 
All models and all stages within Saltus agree in the predictions about the Class 1 
items, but Saltus Stage 1 accords with this pattern best. For a student low in Class 
1, correct answers to Class 2 and Class 3 items would be more rare than the RM 
or 2PL would predict. 

• Examinee 18 answered all Class 1 and Class 2 items correctly, but only three 
Class 3 items. This is a prototypical example of a Saltus Stage 2 pattern. For a 
student with this many correct responses, the RM and 2PL predict relatively fewer 
successes on Class 1 and 2 items, and relatively more successes on Class 3 items. 

• Examinee 536 also has Stage 2 as the most probable stage under Saltus, with a 
posterior probability of .67. There is an appreciable .33 probability for Stage 3, 
however, since half of the Class 3 items were answered correctly. 
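
Posterior stage probabilities of the kind discussed above can be approximated with a few lines of Python. The sketch below is not the program used for this analysis. It assumes the logistic response function with argument θ - β_j + τ_hk, normal θ distributions within stages, and the rounded estimates of Tables 4 through 6; the integration over θ uses a simple equally spaced grid, and the grid settings and function names are our own choices. The response pattern is hypothetical: it has the shape described for Examinee 18 (all Class 1 and Class 2 items correct, three Class 3 items correct), but the particular Class 3 items marked correct are an assumption.

```python
import math

BETA = [-2.94, -2.34, -2.26, -2.38, -2.66, -2.57,
        0.00, -0.52, -0.02, -0.94,
        1.32, 1.77, 1.97, 1.36, 1.93, 1.20, 1.64, 0.95, 2.51, 1.97]
ITEM_CLASS = [0] * 6 + [1] * 4 + [2] * 10
TAU = [[0.0, 0.0, 0.0],      # TAU[stage][class], 0-based
       [0.0, 2.85, 1.00],
       [0.0, 1.21, 3.13]]
PI = [0.45, 0.25, 0.31]      # rounded; renormalized implicitly below
MU = [-2.27, -0.77, -0.44]
SIGMA = [0.68, 0.90, 0.85]


def pattern_likelihood(responses, theta, stage):
    """Probability of the response pattern given theta and stage."""
    like = 1.0
    for x, b, k in zip(responses, BETA, ITEM_CLASS):
        p = 1.0 / (1.0 + math.exp(-(theta - b + TAU[stage][k])))
        like *= p if x == 1 else 1.0 - p
    return like


def stage_posterior(responses, n_points=81, width=5.0):
    """Posterior stage probabilities, integrating theta out on a grid."""
    weights = []
    for h in range(3):
        lo = MU[h] - width * SIGMA[h]
        step = 2.0 * width * SIGMA[h] / (n_points - 1)
        total = 0.0
        for i in range(n_points):
            theta = lo + i * step
            z = (theta - MU[h]) / SIGMA[h]
            density = math.exp(-0.5 * z * z) / (SIGMA[h] * math.sqrt(2.0 * math.pi))
            total += pattern_likelihood(responses, theta, h) * density * step
        weights.append(PI[h] * total)
    norm = sum(weights)
    return [w / norm for w in weights]


# Hypothetical pattern: Classes 1 and 2 all correct, items 11, 14, 16 correct.
pattern = [1] * 10 + [1, 0, 0, 1, 0, 1, 0, 0, 0, 0]
posterior = stage_posterior(pattern)
print([round(p, 3) for p in posterior])
```

The same function can be applied to any response vector of length 20; the conditional θ estimates reported alongside the stage probabilities would require an additional step that is not shown here.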

In this example, the improvements of fit over the Rasch model offered by both the 
2PL and Saltus clearly indicate that there is more going on in the data than the RM can 
capture. The Saltus approach illustrates the potential role of theories about learning in 
the domain to provide inferences about the nature of students' competencies. 

7.0 Conclusion 

This paper has described a marginal maximum likelihood (MML) estimation 
algorithm for Wilson's (1984, 1989) Saltus model. The algorithm's performance was 
compared with that of joint maximum likelihood (JML), in which estimates of subjects' 
unobservable Saltus group memberships based on their total scores are treated as known. 
Substantial improvements were observed for tests of moderate length (10 items per class) 
and short length (4 items per class), in which misclassification of subjects is most likely 
to occur. Biases in estimates of structural parameters were eliminated almost completely 
for the moderate-length test, but not for the short test. In addition to reducing estimation 
biases, MML provides standard errors for item and Saltus parameter estimates that 
appropriately incorporate uncertainty due to imperfect information about examinees' 
Saltus group memberships. 

Table 1 

Generating Values and Estimates for the Moderate Number-of-Items Condition 

                               τ22 = 1.5                  τ22 = 0.5
                         ----------------------    ----------------------
            Generating   Marginal    Solution      Marginal    Solution
Parameter   Values       Solution    treating      Solution    treating
                                     φ̂ as φ                    φ̂ as φ

β1            -1.50       -1.52       -2.25         -1.45       -1.89
β2            -1.40       -1.37       -2.15         -1.38       -1.86
β3            -1.30       -1.32       -2.11         -1.29       -1.79
β4            -1.20       -1.20       -2.02         -1.16       -1.68
β5            -1.10       -1.08       -1.92         -1.06       -1.60
β6            -1.00       -0.98       -1.84         -0.87       -1.44
β7            -0.90       -0.92       -1.78         -0.90       -1.47
β8            -0.80       -0.74       -1.64         -0.74       -1.33
β9            -0.60       -0.58       -1.51         -0.57       -1.20
β10           -0.50       -0.43       -1.38         -0.42       -1.07
β11            0.50        0.44        1.09          0.45        0.94
β12            0.60        0.59        1.30          0.57        1.07
β13            0.80        0.79        1.56          0.75        1.26
β14            0.90        0.85        1.65          0.83        1.35
β15            1.00        0.97        1.82          1.00        1.54
β16            1.10        1.10        1.99          1.08        1.63
β17            1.20        1.19        2.13          1.14        1.70
β18            1.30        1.32        2.27          1.27        1.86
β19            1.40        1.39        2.34          1.34        1.94
β20            1.50        1.50        2.45          1.43        2.06
τ22                        1.39        4.37          0.36        2.44
π1             0.50        0.50        0.51          0.54        0.56
π2             0.50        0.50        0.49          0.46        0.44
μ1            -1.50       -1.54       -0.91         -1.37       -0.80
μ2             0.50        0.60        0.49          0.66       -0.27
σ1             0.25        0.25        0.40          0.51        0.87
σ2             0.25        0.21        0.43          0.13        0.45

Table 2 

Generating Values and Estimates for the Small Number-of-Items Condition 

                               τ22 = 1.5                  τ22 = 0.5
                         ----------------------    ----------------------
            Generating   Marginal    Solution      Marginal    Solution
Parameter   Values       Solution    treating      Solution    treating
                                     φ̂ as φ                    φ̂ as φ

β1            -1.50       -1.45       -1.72         -1.37       -1.64
β2            -1.20       -1.19       -1.46         -1.07       -1.38
β3            -1.00       -0.98       -1.27         -0.84       -1.17
β4            -0.50       -0.45       -0.80         -0.29       -0.70
β5             0.50        0.49        0.86          0.37        0.72
β6             1.00        0.94        1.24          0.83        1.16
β7             1.20        1.18        1.45          0.99        1.32
β8             1.50        1.46        1.70          1.38        1.70
τ22                        1.38        2.95         -0.09        1.55
π1             0.50        0.51        0.55          0.59        0.63
π2             0.50        0.50        0.45          0.41        0.37
μ1            -1.50       -1.46       -0.61         -1.21       -0.64
μ2             0.50        0.58       -0.21          1.09       -0.06
σ1             0.25        0.24        0.76          0.47        0.77
σ2             0.25        0.10        0.48          0.08        0.39

Table 3 

Item Text, Percents-Correct, and Saltus Difficulty Parameter Estimates 

        Tatsuoka                        Percent   RM           2PL          2PL
Item    Item #     Text                 Correct   Difficulty   Difficulty   Slope

Saltus Class 1 Items
 1       6         6/7 - 4/7 =           .79       -1.36        -1.46         .77
 2       8         [illegible]           .71        -.92        -1.23         .44
 3       9         [illegible]           .69        -.86        -3.97         .12
 4      12         [illegible]           .71        -.94         -.97         .65
 5      14         [illegible]           .75       -1.16        -1.10         .85
 6      16         [illegible]           .74       -1.09        -1.05         .81

Saltus Class 2 Items
 7       1         5/3 - 3/4 =           .50        -.04          .29        1.04
 8       2         3/4 - 3/8 =           .56        -.31          .06        1.68
 9       3         [illegible]           .51        -.05          .31        1.36
10       5         [illegible]           .61        -.51         -.89         .27

Saltus Class 3 Items
11       4         [illegible]           .37         .54          .86        1.96
12       7         [illegible]           .33         .76         1.10         .98
13      10         [illegible]           .31         .84         1.08        2.28
14      11         [illegible]           .37         .56          .89        1.25
15      13         [illegible]           .31         .82         1.10        4.58
16      15         [illegible]           .38         .49          .84        1.08
17      17         7 3/5 - 4/5 =         .34         .69         1.02        1.15
18      18         [illegible]           .41         .37          .73        1.03
19      19         [illegible]           .26        1.10         1.31        1.75
20      20         [illegible]           .31         .84         1.11        1.61

Table 4 

Saltus Item Parameter Estimates 

                           Implied Within-Stage Difficulty
                           -------------------------------
Item      β      SE(β)     Stage 1    Stage 2    Stage 3

Saltus Class 1 Items
 1      -2.94     .15       -2.94      -2.94      -2.94
 2      -2.34     .14       -2.34      -2.34      -2.34
 3      -2.26     .14       -2.26      -2.26      -2.26
 4      -2.38     .14       -2.38      -2.38      -2.38
 5      -2.66     .14       -2.66      -2.66      -2.66
 6      -2.57     .14       -2.57      -2.57      -2.57

Saltus Class 2 Items
 7       0.00     .16        0.00      -2.85      -1.20
 8      -0.52     .16       -0.52      -3.37      -1.73
 9      -0.02     .16       -0.02      -2.88      -1.23
10      -0.94     .16       -0.94      -3.79      -2.14

Saltus Class 3 Items
11       1.32     .18        1.32       0.32      -1.80
12       1.77     .18        1.77       0.77      -1.36
13       1.97     .18        1.97       0.96      -1.16
14       1.36     .18        1.36       0.35      -1.77
15       1.93     .18        1.93       0.93      -1.19
16       1.20     .18        1.20       0.20      -1.93
17       1.64     .18        1.64       0.64      -1.49
18       0.95     .18        0.95      -0.05      -2.18
19       2.51     .19        2.51       1.51      -0.62
20       1.97     .18        1.97       0.96      -1.16

Table 5 

Saltus Parameter Estimates (Standard Errors in Parentheses) 

                          Examinee Stage
              ---------------------------------------
Item Class    1          2               3

1             0.00*      0.00*           0.00*
2             0.00*      2.85 (0.20)     1.21 (0.13)
3             0.00*      1.00 (0.09)     3.13 (0.08)

* Fixed at zero for model identification. 

Table 6 

Saltus Examinee-Stage Estimates 

Parameter    Stage 1    Stage 2    Stage 3

π             0.45       0.25       0.31
μ            -2.27      -0.77      -0.44
σ             0.68       0.90       0.85

Table 7 

Modelled Average Percent-Correct for Saltus Classes 

Item       Stage 1    Stage 2    Stage 3

Saltus Class 1 Items
 1          0.66       0.90       0.92
 2          0.52       0.83       0.87
 3          0.50       0.82       0.86
 4          0.53       0.83       0.87
 5          0.60       0.87       0.90
 6          0.57       0.86       0.89
Average     0.56       0.85       0.89

Saltus Class 2 Items
 7          0.09       0.89       0.68
 8          0.15       0.93       0.78
 9          0.10       0.89       0.69
10          0.21       0.95       0.85
Average     0.14       0.92       0.75

Saltus Class 3 Items
11          0.03       0.25       0.80
12          0.02       0.18       0.71
13          0.01       0.15       0.67
14          0.03       0.25       0.79
15          0.01       0.16       0.68
16          0.03       0.28       0.82
17          0.02       0.20       0.74
18          0.04       0.33       0.85
19          0.01       0.09       0.54
20          0.01       0.15       0.67
Average     0.02       0.20       0.73

Figure 1 

Modelled Saltus Item Locations and Class Membership Distributions 